ABSTRACT

Advances in electron microscopy (EM) allow for 
structure determination of large biological assem- 
blies at increasingly higher resolutions. A key step 
in this process is fitting multiple component struc- 
tures into an EM-derived density map of their 
assembly. Here, we describe a web server for this 
task. The server takes as input a set of protein 
structures in the PDB format and an EM density 
map in the MRC format. The output is an ensemble 
of models ranked by their quality of fit to the density 
map. The models can be viewed online or downloaded
from the website. The service is available at
http://salilab.org/multifit/ and
http://bioinfo3d.cs.tau.ac.il/.

SIGNIFICANCE 

Macromolecular assemblies are involved in nearly all cel- 
lular processes. Determining the structures of these bio- 
logical machines is crucial for deciphering their function. 
Recent advances established electron microscopy as a central 
technique for studying the structures of macromolecular as- 
semblies in different functional states in vitro and in vivo. 
Because the resolution of an electron microscopy density 
map is relatively low, fitting of atomic resolution component 
structures into the density map of the whole assembly is es- 
sential. MultiFit is the first web server for achieving this task. 

INTRODUCTION 

Recent advances have established electron microscopy 
(EM) as a central technique for studying the structures 
of macromolecular assemblies in different functional
states in vitro and in vivo (1). The resolution of an EM
density map is typically better than 25 Å, and can be as
high as ~4 Å for highly symmetric structures (2,3). In
most cases, however, the resolution is insufficient to construct
a full atomic model of a protein complex. Fitting
atomic-resolution structures into an EM density map of
the whole assembly is therefore essential (4-8).

In the past decade, different algorithms have been de- 
veloped for fitting a single protein subunit into its density 
map (9-20). Most methods use a variant of the cross-cor- 
relation coefficient as the quality-of-fit measure (21). The 
position of a protein subunit inside the density map is 
sampled either exhaustively or by matching precalculated 
geometric features. Methods for fitting multiple compo- 
nents of large assemblies have also been recently described 
(22-25). In particular, we have developed the MultiFit 
module of the Integrative Modeling Platform (IMP, http:// 
www.salilab.org/imp/) software package (23,26). MultiFit 
simultaneously positions protein subunits into a density map 
of a protein assembly by combining geometric criteria com- 
monly used in molecular docking and quality-of-fit 
criteria commonly used in EM fitting. The method was 
validated in the 2010 EM modeling challenge (http:// 
ncmi.bcm.edu/challenge/). 
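
As a minimal illustration of the quality-of-fit measure mentioned above, the sketch below computes one common variant, the mean-subtracted (Pearson-style) cross-correlation coefficient between an experimental map and a map simulated from a placed model. It is not the MultiFit or IMP implementation; the function name, the flat voxel lists and the toy density values are assumptions made for illustration only.

```python
import math

def cross_correlation(em_voxels, model_voxels):
    """Mean-subtracted, normalized cross-correlation of two maps on the same grid.

    Both arguments are flat sequences of density values (a hypothetical layout).
    The result lies in [-1, 1]; 1 means the maps agree up to scale and offset.
    """
    if len(em_voxels) != len(model_voxels):
        raise ValueError("maps must be sampled on the same grid")
    n = len(em_voxels)
    mean_em = sum(em_voxels) / n
    mean_model = sum(model_voxels) / n
    num = sum((a - mean_em) * (b - mean_model)
              for a, b in zip(em_voxels, model_voxels))
    den = math.sqrt(sum((a - mean_em) ** 2 for a in em_voxels)
                    * sum((b - mean_model) ** 2 for b in model_voxels))
    if den == 0.0:
        raise ValueError("a constant map has no defined correlation")
    return num / den

# Toy one-dimensional "maps": a well-placed model and a misplaced one.
em_map = [0.0, 0.2, 0.9, 1.0, 0.3, 0.0]
good_fit = [0.0, 0.1, 0.8, 1.1, 0.2, 0.0]
shifted_fit = [0.8, 1.1, 0.2, 0.0, 0.0, 0.1]

score_good = cross_correlation(em_map, good_fit)
score_shifted = cross_correlation(em_map, shifted_fit)
```

In this toy case the well-placed model scores higher than the misplaced one, which is the property a fitting procedure relies on when it ranks candidate positions.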

Here, we present a web interface to MultiFit. The server 
takes as input a set of protein structures in the PDB 
format and an EM density map in the MRC format. 
The output is an ensemble of models ranked by their 
quality of fit to the density map. The models can be 
viewed online or downloaded from the website. 

THE MULTIFIT METHOD 

MultiFit is a method for simultaneously fitting atomic- 
resolution protein structures into their assembly density 
map at resolutions as low as 25 Å. The input is a set of
atomic structures of proteins and an EM density map of 
their assembly. The component positions and orientations 
are optimized with respect to a scoring function that in- 
cludes the quality-of-fit of components in the map, the pro- 
trusion of components from the map envelope and the 
shape complementarity between pairs of components. 
The scoring function is optimized by an exact inference 
optimizer DOMINO (Discrete Optimization of Multiple 
INteracting Objects) that efficiently finds the global min- 
imum within a discrete sampling space. Specifically, the 
optimization algorithm is composed of four stages, each 
sampling assembly models at increasingly higher resolution
and accuracy. In the 'anchor graph segmentation'
stage, an unlabeled segmentation of the density map into
regions is calculated using a Gaussian mixture model; the
segmented regions correspond approximately to the
subunits in the complex. In the 'fitting-based assembly
configuration' stage, a set of coarse assembly models is found by
an enumeration over possible assignments of subunits to
regions, followed by simultaneous local fitting of the
subunits in the corresponding regions. In the 'docking-based
pose refinement' stage, each of the models found in the
'configuration' stage is refined by simultaneous local
optimization of the interfaces between pairs of interacting
subunits as sampled by local pairwise docking. In the 'rigid
body minimization' stage, each of the models found in the
'refinement' stage is further refined using a local Monte
Carlo/conjugate gradients minimization procedure. The
default run of the MultiFit web server omits the final re- 
finement stage. Users can explore the ensemble of solu- 
tions generated by the first three stages and then refine a 
subset of the ensemble using a downloaded version of 
MultiFit. For cyclic symmetric complexes, the symmetry 
is imposed within the optimization procedure for improved 
efficiency, such that only symmetric models are sampled. 
In particular, in 'fitting-based assembly configuration' and 
'docking-based pose refinement', only cyclic symmetric 
models consistent with the symmetry of the density map 
are sampled (26). 
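
The enumeration over assignments of subunits to segmented regions can be pictured with a toy example. The sketch below is a brute-force illustration under stated assumptions: the subunit and region volumes are invented, and the volume-mismatch score is a stand-in chosen for simplicity. MultiFit itself scores assignments by local fitting and optimizes with DOMINO rather than by exhaustive sorting.

```python
from itertools import permutations

def enumerate_assignments(subunits, regions, score):
    """Yield (total_score, assignment) for every one-to-one subunit-to-region mapping."""
    for perm in permutations(regions, len(subunits)):
        assignment = dict(zip(subunits, perm))
        total = sum(score(s, r) for s, r in assignment.items())
        yield total, assignment

# Hypothetical volumes (arbitrary units) for three subunits and three regions.
subunit_volume = {"A": 30.0, "B": 55.0, "C": 42.0}
region_volume = {"r1": 54.0, "r2": 41.0, "r3": 31.0}

def volume_mismatch(subunit, region):
    """Toy score: lower is better when a subunit and a region have similar size."""
    return abs(subunit_volume[subunit] - region_volume[region])

# Rank all assignments by total score; sort on the score only,
# because the assignment dictionaries are not orderable.
ranked = sorted(
    enumerate_assignments(list(subunit_volume), list(region_volume), volume_mismatch),
    key=lambda item: item[0],
)
best_score, best_assignment = ranked[0]
```

The number of assignments grows factorially with the number of subunits, which is one reason a method of this kind benefits from a dedicated discrete optimizer instead of plain enumeration.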

WEB SERVER 

Input 

Figure 1. Snapshots of the MultiFit web server. (A) Input page. The inputs are divided into three parts: (i) general information, (ii) density map
information and (iii) protein complex information. Seven copies of the GroEL chaperon monomer [PDB entry 1oel (33)] are simultaneously fitted to
its ring density map at 11.5-Å resolution [EMDB entry 1080 (34)] using cyclic symmetry mode. The input subunit PDB file and the input assembly
density map used for this example can be obtained from the MultiFit web server help page. The input parameters for the resolution, spacing, contour
level and symmetry order obtained from the EMDB site are 11.5, 2.7, 0.852 and 7, respectively. The optional parameters for X, Y and Z origins are set
to -50, -50 and -50, respectively. (B) Output page. The top 20 assembly models of the GroEL chaperon complex are ranked according to the
quality-of-fit score from top left to bottom right. The user can click on the model thumbnail to open it using Chimera for further analysis. The PDB
files and the transformation output file can be downloaded. Job results will be available for 6 days. (C) Top scored structural model of the GroEL
ring fitted into the density map.

The MultiFit web server requires as input a set of protein
structures in the PDB format, an EM density map
of their assembly in the MRC format, and a few
parameters (Figure 1). The parameters for the density map
include: (i) resolution (Å) (27); (ii) voxel spacing on the
grid representing the map (Å); and (iii) the contour level
that results in the volume accommodating the molecular
mass of the complex. These parameters are included for
maps deposited in the EM Data Bank (EMDB) (28).
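
For maps that do not come with a recommended contour level, the criterion above can be approximated by hand. The sketch below is a hedged illustration, not part of the server: it assumes an average protein volume of about 1.21 Å^3 per dalton (a commonly quoted approximation), takes the map as a flat list of voxel values, and uses made-up numbers. The function names are hypothetical.

```python
# Assumed average volume occupied by protein, in cubic angstroms per dalton.
PROTEIN_VOLUME_PER_DALTON = 1.21

def enclosed_volume(voxels, spacing, contour_level):
    """Volume (Å^3) of all voxels whose density is at or above the contour level."""
    n_inside = sum(1 for v in voxels if v >= contour_level)
    return n_inside * spacing ** 3

def expected_volume(mass_da):
    """Volume (Å^3) expected for a complex of the given molecular mass."""
    return mass_da * PROTEIN_VOLUME_PER_DALTON

def pick_contour_level(voxels, spacing, mass_da):
    """Return the observed density value whose enclosed volume best matches the mass."""
    target = expected_volume(mass_da)
    candidates = sorted(set(voxels))
    return min(
        candidates,
        key=lambda level: abs(enclosed_volume(voxels, spacing, level) - target),
    )

# Toy map: 100 voxels at 2.0 Å spacing, mostly empty background.
toy_voxels = [0.0] * 90 + [0.5] * 6 + [1.0] * 4
level = pick_contour_level(toy_voxels, spacing=2.0, mass_da=66.0)
```

With these toy values the threshold 0.5 encloses 10 voxels (80 Å^3), which is closest to the expected volume for the assumed mass. A real map would be read from an MRC file and would need a finer scan over thresholds.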

The MultiFit web server operates in two modes: cyclic- 
symmetric and non-symmetric. In the cyclic-symmetric 
mode, the symmetry order should be provided (2 for 
dimer, 3 for trimer, etc.). If the arrangement of the input 
monomers in their native complex follows a different type of
symmetry, the user should use the downloaded version of 
MultiFit. In the non-symmetric mode, a list of subunit 
PDB files and the number of copies of each subunit are 
required. The input density should be pre-segmented to 
contain only the input set of proteins. 
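
To make the role of the symmetry order concrete: in a cyclic arrangement of order n, the copies of a monomer are related by rotations of 360/n degrees about a common axis. The sketch below generates those copies for a single point, assuming (for illustration only) that the symmetry axis is the z axis through the origin. The function name and coordinates are hypothetical, and this is not how the server represents its models.

```python
import math

def cyclic_copies(point, order):
    """Return the `order` images of an (x, y, z) point under rotation about the z axis."""
    x, y, z = point
    copies = []
    for k in range(order):
        angle = 2.0 * math.pi * k / order
        c, s = math.cos(angle), math.sin(angle)
        # Standard rotation about z; the z coordinate is unchanged.
        copies.append((c * x - s * y, s * x + c * y, z))
    return copies

# Seven-fold ring, as in the GroEL example: one reference point 40 Å from the axis.
ring = cyclic_copies((40.0, 0.0, 5.0), 7)
radii = [math.hypot(x, y) for x, y, _ in ring]
```

Every copy stays at the same distance from the axis and at the same height, so only the placement of one monomer has to be sampled; the remaining copies follow from the symmetry order.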

The server also has an optional input parameter spe- 
cifying an e-mail address to which a link to the results 
page will be sent once the job is completed. 
Alternatively, the user can bookmark a web link to the 
results page at the time of data submission. The status of 
the job (queued, running or finished) can be accessed on 
the queue page. 



Output 

The computation is performed in real time and the server 
page is updated once the calculation has finished. The 
typical running time is about 20 min for assemblies with
tens of thousands of atoms. The web server output page 
displays a table of the top 20 assembly models that best fit 
the assembly density map, along with their quality-of-fit 
scores ranked from top left to bottom right (Figure 1). 
MultiFit lists the optimal as well as suboptimal solutions; 
when the latter have good scores and are different from 
the optimal solution, the user should be skeptical about all 
solutions and further analyze the ensemble. 

Each model can be saved as a PDB file and can also 
be directly opened with UCSF Chimera (19). A com- 
pressed file containing all models is available for down- 
load. Moreover, the MultiFit output text file can be 
downloaded. Row i lists the transformation applied to 
each of the subunits, the model quality-of-fit score, 
and the geometric complementarity score for model i. 
This output file can be used as input to IMP for further 
refinement and analysis. It can also be used as input 
for refining symmetric complexes using the SymmRef 
method (29). 



CONCLUSIONS 

With the growing number of macromolecular assemblies 
characterized by EM, integrative modeling techniques are 
becoming increasingly useful for a mechanistic under- 
standing of these assemblies (6,30-32). The MultiFit web 
server was designed to provide a user-friendly web inter- 
face to the MultiFit module in the IMP package, for 
fitting multiple protein structures into their assembly 
density map. 